Off-Line Dictionary-Based Compression

Abstract

The dictionary-based compression methods described in Chapter 3 of the book differ from one another but have one thing in common: they generate the dictionary as they go along, reading data and compressing it. The dictionary is not included in the compressed file; the decoder generates it in lockstep with the encoder. Such methods can therefore be termed "online." In contrast, the methods described here are also dictionary based but can be considered "offline" because they include the dictionary in the compressed file.

The first method is byte pair encoding (BPE). This is a simple compression method, due to [Gage 94], that often features only mediocre performance. It is described here because (1) it is an example of a multipass method (two-pass compression algorithms are common, but methods with more passes are normally considered too slow) and (2) it eliminates only certain types of redundancy and should therefore be applied only to data files that feature this redundancy. (The second method, by [Larsson and Moffat 00], does not suffer from these restrictions and is much more efficient.) BPE is both an example of an offline dictionary-based compression algorithm and a simple example (perhaps the simplest) of a grammar-based compression method. In addition, the BPE decoder is very small, which makes it an ideal candidate for applications where memory size is restricted.

The BPE method is easy to understand. We assume that the data symbols are bytes and we use the term bigram for a pair of consecutive bytes. Each pass locates the most-common bigram and replaces it with an unused byte value. Thus, the method performs best on files that have many unused byte values, and one aim of this document is to show what types of data feature this kind of redundancy. First, however, a small example.

Given the character set A, B, C, D, X, and Y and the data file ABABCABCD (where X and Y are unused bytes), the first pass identifies the pair AB as the most-common bigram and replaces each of its three occurrences with the single byte X. The result is XXCXCD. The second pass identifies the pair XC as the most-common bigram and replaces each of its two occurrences with the single byte Y. The result is XYYD, where every bigram occurs just once. Bigrams that occur just once can also be replaced, if more unused byte values are available. However, each …
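The passes in the small example can be sketched in a few lines of Python. This is only an illustrative sketch of the idea, not the implementation of [Gage 94]: the function names `bpe_compress` and `bpe_expand`, the `unused` and `min_count` parameters, and the choice to stop once no bigram occurs at least twice are all assumptions made for the illustration. The list of replacement rules it returns plays the role of the dictionary that an offline method would store in the compressed file.

```python
from collections import Counter


def bpe_compress(data, unused=None, min_count=2):
    """Illustrative BPE sketch: one bigram replacement per pass.

    `unused` and `min_count` are hypothetical parameters, not part of
    any published implementation.
    """
    buf = list(data)
    present = set(buf)
    # Candidate replacement bytes: caller-supplied, or every byte value
    # that does not occur in the input.
    pool = [b for b in (unused if unused is not None else range(256))
            if b not in present]
    rules = []  # (new_byte, (left, right)), in order of creation
    while pool:
        # Count adjacent pairs (overlapping pairs are counted naively).
        counts = Counter(zip(buf, buf[1:]))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < min_count:
            break
        new = pool.pop(0)
        rules.append((new, pair))
        # Left-to-right scan replacing each occurrence of the pair.
        out, i = [], 0
        while i < len(buf):
            if i + 1 < len(buf) and (buf[i], buf[i + 1]) == pair:
                out.append(new)
                i += 2
            else:
                out.append(buf[i])
                i += 1
        buf = out
    return bytes(buf), rules


def bpe_expand(data, rules):
    """Undo the replacements, newest rule first."""
    buf = list(data)
    for new, (left, right) in reversed(rules):
        out = []
        for b in buf:
            if b == new:
                out.extend((left, right))
            else:
                out.append(b)
        buf = out
    return bytes(buf)
```

Called as `bpe_compress(b"ABABCABCD", unused=b"XY")`, the sketch returns `b"XYYD"` together with two rules (X for AB, then Y for XC), and `bpe_expand` applied to that result restores `b"ABABCABCD"`. The expansion loop is only a few lines long, which is consistent with the remark that the BPE decoder is very small.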


Similar resources

Data Compression Using a Dictionary of Patterns

Most modern lossless data compression techniques are based on dictionaries. If some string of data being compressed matches a portion previously seen, then such string is included in the dictionary and its reference is included every time it occurs. A possible generalization of this scheme is to consider not only strings made of consecutive symbols, but more general patterns with ga...

DNA Sequence Compression Using the Burrows-Wheeler Transform

We investigate off-line dictionary oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed...

JBIG2 Symbol Dictionary Design Based on Minimum Spanning Trees

The JBIG2 standard is a very flexible bi-level image coding strategy based on pattern matching. The encoder collects a set of symbols in a dictionary and encodes a page by reference to the dictionary symbols. JBIG2 allows the encoder to view all symbols and choose a good set for the dictionary. In this paper, we examine the bit rate trade-off that arises in choosing different dictionary sizes. ...

Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches

With the widening gap between processor and memory speeds, memory system designers may find cache compression beneficial to increase cache capacity and reduce off-chip bandwidth. Most hardware compression algorithms fall into the dictionary-based category, which depend on building a dictionary and using its entries to encode repeated data values. Such algorithms are effective in compressing lar...

Dictionary design for text image compression with JBIG2

The JBIG2 standard for lossy and lossless bi-level image coding is a very flexible encoding strategy based on pattern matching techniques. This paper addresses the problem of compressing text images with JBIG2. For text image compression, JBIG2 allows two encoding strategies: SPM and PM&S. We compare in detail the lossless and lossy coding performance using the SPM-based and PM&S-based JBIG2, i...


Publication date: 2007